add rust support#113
Conversation
|
Also, I think it would make sense to move inline static size_t aligned_size(size_t alignment, size_t size)
{
...
}Should work. Then you can remove it from |
|
I do not quite understand why cxx17 features are not available even with |
CMakeLists.txt
Outdated
| add_library(${name} ${type} ${ARGN}) | ||
| target_link_libraries(${name} snmalloc_lib) | ||
| if(MSVC) | ||
| target_compile_definitions(${name} PRIVATE "SNMALLOC_EXPORT=__declspec(dllexport)") |
There was a problem hiding this comment.
Perhaps this is not actually needed, but have it added anyway.
There was a problem hiding this comment.
I am not sure what is required to link this into Rust. As this is a static lib, I would have said it isn't needed. @snf any thoughts?
There was a problem hiding this comment.
Pretty sure it's not required for a static lib
There was a problem hiding this comment.
Could you drop this bit please, and I'll merge.
There was a problem hiding this comment.
@mjp41 I went to beg by then. Sorry for the late reply.
|
So I don't know why, but disabling the line # Ensure that we do not link against C++ stdlib when compiling shims.
set_target_properties(${name} PROPERTIES LINKER_LANGUAGE C)for MSVC, fixes the C++17 compilation. Changing to # Ensure that we do not link against C++ stdlib when compiling shims.
if(NOT MSVC)
set_target_properties(${name} PROPERTIES LINKER_LANGUAGE C)
endif()should fix your build issues at least. |
|
with this as an experiment: SNMALLOC_FAST_PATH static size_t aligned_size(size_t alignment, size_t size)
{
// Client responsible for checking alignment is not zero
assert(alignment != 0);
// Client responsible for checking alignment is not above SUPERSLAB_SIZE
assert(alignment <= SUPERSLAB_SIZE);
// Client responsible for checking alignment is a power of two
assert(bits::next_pow2(alignment) == alignment);
size = bits::max(size, alignment);
snmalloc::sizeclass_t sc = size_to_sizeclass(size);
if (sc >= NUM_SIZECLASSES)
{
// large allocs are 16M aligned, which is maximum we guarantee
return 1;
}
size_t t = 2;
for (; sc < NUM_SIZECLASSES; sc++, t++)
{
size = sizeclass_to_size(sc);
if ((size & (~size + 1)) >= alignment)
{
return t;
}
}
// Give max alignment.
return t;
}and #include <iostream>
#include "src/snmalloc.h"
#include <cstdlib>
#define LIMIT 10000000
int main() {
size_t all = 0;
size_t max = 0;
srand(time(NULL));
for(size_t i = 0; i < LIMIT; ++i) {
size_t a = snmalloc::bits::next_pow2(1 + (size_t)rand() % (snmalloc::SUPERSLAB_SIZE));
size_t m = rand();
auto t = snmalloc::aligned_size(a, m);
all += t;
max = std::max(max, t);
}
std::cout << "ave: " << (double)all / (double)LIMIT << ", max: " << max << std::endl;
}I can get: ➜ cmake-build-debug git:(master) ✗ for i in $(seq 1 10); do ./test2; done
ave: 1.00572, max: 5
ave: 1.00572, max: 5
ave: 1.00572, max: 5
ave: 1.00569, max: 5
ave: 1.00569, max: 5
ave: 1.00569, max: 5
ave: 1.00572, max: 5
ave: 1.00572, max: 5
ave: 1.00572, max: 5
ave: 1.00572, max: 5So perhaps it is acceptable as the average case is quite good? |
|
Not sure I follow. Are you trying to work out: how much we round up the size to account for alignment? Should the following line subtract, m? auto t = snmalloc::aligned_size(a, m);Also, if auto t = snmalloc::aligned_size(a, m) - std::max(a,m); |
|
I changed the fake In most cases, the memory size should be smaller. Hence, I test again with size_t m = rand() % snmalloc::size_to_sizeclass(snmalloc::NUM_SIZECLASSES);➜ cmake-build-debug git:(master) ✗ for i in $(seq 1 10); do ./test2; done
ave: 1.49993, max: 2
ave: 1.4999, max: 2
ave: 1.4999, max: 2
ave: 1.49986, max: 2
ave: 1.49986, max: 2
ave: 1.49981, max: 2
ave: 1.49981, max: 2
ave: 1.5001, max: 2
ave: 1.5001, max: 2
ave: 1.49982, max: 2 |
|
It is more the branches on the fast path that worry me. So looking at the disasm from Clang: 0000000000000000 <rust_alloc>:
0: 48 89 f8 mov %rdi,%rax
3: 64 48 8b 3c 25 00 00 mov %fs:0x0,%rdi
a: 00 00
c: 48 39 c6 cmp %rax,%rsi
f: 48 0f 46 f0 cmovbe %rax,%rsi
13: 48 8d 4e ff lea -0x1(%rsi),%rcx
17: 48 89 ca mov %rcx,%rdx
1a: 48 81 f9 ff ff 00 00 cmp $0xffff,%rcx
21: 77 13 ja 36 <rust_alloc+0x36>
23: 48 83 e2 f8 and $0xfffffffffffffff8,%rdx
27: 48 8b 92 00 00 00 00 mov 0x0(%rdx),%rdx
2e: 48 83 fa 4a cmp $0x4a,%rdx
32: 76 3c jbe 70 <rust_alloc+0x70>
34: eb 69 jmp 9f <rust_alloc+0x9f>
36: 48 83 ca 20 or $0x20,%rdx
3a: f3 4c 0f bd c2 lzcnt %rdx,%r8
3f: 41 b9 3a 00 00 00 mov $0x3a,%r9d
45: 45 31 d2 xor %r10d,%r10d
48: 4d 29 c1 sub %r8,%r9
4b: 41 0f 95 c2 setne %r10b
4f: 44 89 ca mov %r9d,%edx
52: 44 29 d2 sub %r10d,%edx
55: 80 c2 04 add $0x4,%dl
58: c4 e2 eb f7 d1 shrx %rdx,%rcx,%rdx
5d: 83 e2 03 and $0x3,%edx
60: 4a 8d 14 8a lea (%rdx,%r9,4),%rdx
64: 48 83 fa 4a cmp $0x4a,%rdx
68: 77 35 ja 9f <rust_alloc+0x9f>
6a: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
70: 89 d1 mov %edx,%ecx
72: 83 e1 7f and $0x7f,%ecx
75: 48 8b 34 cd 00 00 00 mov 0x0(,%rcx,8),%rsi
7c: 00
7d: c4 e2 f0 f3 de blsi %rsi,%rcx
82: 48 39 c1 cmp %rax,%rcx
85: 73 14 jae 9b <rust_alloc+0x9b>
87: 48 83 c2 01 add $0x1,%rdx
8b: 48 83 fa 4a cmp $0x4a,%rdx
8f: 76 df jbe 70 <rust_alloc+0x70>
91: be 00 00 00 01 mov $0x1000000,%esi
96: e9 00 00 00 00 jmpq 9b <rust_alloc+0x9b>
9b: 48 8d 4e ff lea -0x1(%rsi),%rcx
9f: 48 81 f9 ff ff 00 00 cmp $0xffff,%rcx
a6: 77 ee ja 96 <rust_alloc+0x96>
a8: 48 83 e1 f8 and $0xfffffffffffffff8,%rcx
ac: 48 8b b1 00 00 00 00 mov 0x0(%rcx),%rsi
b3: 48 8b 04 f7 mov (%rdi,%rsi,8),%rax
b7: 48 85 c0 test %rax,%rax
ba: 74 08 je c4 <rust_alloc+0xc4>
bc: 48 8b 08 mov (%rax),%rcx
bf: 48 89 0c f7 mov %rcx,(%rdi,%rsi,8)
c3: c3 retq
c4: e9 00 00 00 00 jmpq c9 <rust_alloc+0xc9>
c9: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)where as 0000000000003b60 <malloc>:
3b60: 48 89 fe mov %rdi,%rsi
3b63: 48 8b 05 66 44 22 00 mov 0x224466(%rip),%rax # 227fd0 <.got+0x8>
3b6a: 64 48 8b 38 mov %fs:(%rax),%rdi
3b6e: 48 8d 46 ff lea -0x1(%rsi),%rax
3b72: 48 3d ff ff 00 00 cmp $0xffff,%rax
3b78: 77 20 ja 3b9a <malloc+0x3a>
3b7a: 48 83 e0 f8 and $0xfffffffffffffff8,%rax
3b7e: 48 8d 0d 8b 1e 01 00 lea 0x11e8b(%rip),%rcx # 15a10 <_ZN8snmallocL18sizeclass_metadataE>
3b85: 48 8b 34 08 mov (%rax,%rcx,1),%rsi
3b89: 48 8b 04 f7 mov (%rdi,%rsi,8),%rax
3b8d: 48 85 c0 test %rax,%rax
3b90: 74 0d je 3b9f <malloc+0x3f>
3b92: 48 8b 08 mov (%rax),%rcx
3b95: 48 89 0c f7 mov %rcx,(%rdi,%rsi,8)
3b99: c3 retq
3b9a: e9 21 f2 ff ff jmpq 2dc0 <_ZN8snmalloc9AllocatorINS_24MemoryProviderStateMixinINS_8PALLinuxEEENS_15DefaultChunkMapINS_21GlobalPagemapTemplateINS_11FlatPagemapILm24EhEEEEEELb1EXadL_ZNS_16lazy_replacementEPvEEE15alloc_not_smallILNS_7ZeroMemE0ELNS_12AllowReserveE1EEESA_m>
3b9f: e9 7c f4 ff ff jmpq 3020 <_ZN8snmalloc9AllocatorINS_24MemoryProviderStateMixinINS_8PALLinuxEEENS_15DefaultChunkMapINS_21GlobalPagemapTemplateINS_11FlatPagemapILm24EhEEEEEELb1EXadL_ZNS_16lazy_replacementEPvEEE16small_alloc_slowILNS_7ZeroMemE0ELNS_12AllowReserveE1EEESA_m>
3ba4: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
3bab: 00 00 00
3bae: 66 90 xchg %ax,%axThis looks quite a bit slower potentially at the moment. |
|
Okay, I have done a bit of maths, I think that we can simplify the whole piece of code to SNMALLOC_FAST_PATH static size_t aligned_size(size_t alignment, size_t size)
{
// Client responsible for checking alignment is not zero
assert(alignment != 0);
// Client responsible for checking alignment is not above SUPERSLAB_SIZE
assert(alignment <= SUPERSLAB_SIZE);
// Client responsible for checking alignment is a power of two
assert(bits::next_pow2(alignment) == alignment);
return ((alignment - 1) | (size - 1)) + 1;
}I think this should interact correctly with the sizeclass calculation. It would be great to add a validation test that this does the correct thing. Something similar to which should work except for 0. |
|
Cool! #include <iostream>
#include <random>
#include <snmalloc.h>
#include <test/setup.h>
NOINLINE
snmalloc::sizeclass_t size_to_sizeclass(size_t size)
{
return snmalloc::size_to_sizeclass(size);
}
NOINLINE
size_t sizeclass_to_size(snmalloc::sizeclass_t sc)
{
return snmalloc::sizeclass_to_size(sc);
}
NOINLINE
size_t aligned_size(size_t alignment, size_t size)
{
return snmalloc::aligned_size(alignment, size);
}
int main(int, char**)
{
setup();
bool failed = false;
std::random_device dev;
unsigned int seed = dev();
std::cout << "random seed: " << seed << std::endl;
std::uniform_int_distribution<size_t> dist(1);
std::mt19937 rng(seed);
for (snmalloc::sizeclass_t sz = 0; sz < snmalloc::NUM_SIZECLASSES; sz++)
{
// Separate printing for small and medium sizeclasses
if (sz == snmalloc::NUM_SMALL_CLASSES)
std::cout << std::endl;
size_t this_size = snmalloc::sizeclass_to_size(sz),
pre_size = (sz != 0) ? snmalloc::sizeclass_to_size(sz - 1) : 1;
for (size_t alignment = snmalloc::bits::next_pow2(pre_size);
alignment < this_size;
alignment = snmalloc::bits::next_pow2(alignment + 1))
{
std::cout << std::endl;
std::cout << "-- alignment: " << alignment << std::endl;
for (int i = 0; i < 100; ++i)
{
size_t size = dist(rng) % alignment;
std::cout << "---- size: " << size;
bool flag = sizeclass_to_size(size_to_sizeclass(aligned_size(alignment, size))) %
alignment !=
0;
std::cout << ((flag) ? " [FAILED]" : " [SUCCESS]") << std::endl;
failed |= flag;
}
}
}
if (failed)
abort();
}
|
|
@SchrodingerZhu, I would prefer something a bit more exhaustive. I think the following covers the core properties of the sizeclass calculation, and void test_align_size()
{
bool failed = false;
assert(aligned_size(128, 160) == 256);
for (size_t size = 1; size < sizeclass_to_size(NUM_SIZECLASSES - 1); size++)
{
size_t rsize = sizeclass_to_size(size_to_sizeclass(size));
if (rsize < size)
{
std::cout << "Size class rounding shrunk: " << size << " -> " << rsize
<< std::endl;
failed |= true;
}
auto lsb_rsize = rsize & -rsize;
auto lsb_size = size & -size;
if (lsb_rsize < lsb_size)
{
std::cout << "Original size more aligned than rounded size: " << size
<< " (" << lsb_size << ") -> " << rsize << " (" << lsb_rsize
<< ")" << std::endl;
failed |= true;
}
for (size_t alignment_bits = 0; alignment_bits < SUPERSLAB_BITS;
alignment_bits++)
{
auto alignment = (size_t)1 << alignment_bits;
auto asize = aligned_size(alignment, size);
if (asize < size)
{
std::cout << "Shrunk! Alignment: " << alignment << " Size: " << size
<< " ASize: " << asize << std::endl;
failed |= true;
}
if ((asize & (alignment - 1)) != 0)
{
std::cout << "Not aligned! Alignment: " << alignment
<< " Size: " << size << " ASize: " << asize << std::endl;
failed |= true;
}
}
}
if (failed)
abort();
}In particular, the sizeclass rounding, should never set lower bits, and the alignment rounding should |
|
The code gen for this change is really nice, clang gives: 0000000000000000 <rust_alloc>:
0: 48 89 f8 mov %rdi,%rax
3: 64 48 8b 3c 25 00 00 mov %fs:0x0,%rdi
a: 00 00
c: 48 83 c0 ff add $0xffffffffffffffff,%rax
10: 48 83 c6 ff add $0xffffffffffffffff,%rsi
14: 48 09 c6 or %rax,%rsi
17: 48 81 fe ff ff 00 00 cmp $0xffff,%rsi
1e: 77 1c ja 3c <rust_alloc+0x3c>
20: 48 83 e6 f8 and $0xfffffffffffffff8,%rsi
24: 48 8b b6 00 00 00 00 mov 0x0(%rsi),%rsi
2b: 48 8b 04 f7 mov (%rdi,%rsi,8),%rax
2f: 48 85 c0 test %rax,%rax
32: 74 11 je 45 <rust_alloc+0x45>
34: 48 8b 08 mov (%rax),%rcx
37: 48 89 0c f7 mov %rcx,(%rdi,%rsi,8)
3b: c3 retq
3c: 48 83 c6 01 add $0x1,%rsi
40: e9 00 00 00 00 jmpq 45 <rust_alloc+0x45>
45: e9 00 00 00 00 jmpq 4a <rust_alloc+0x4a>
4a: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)I think this will be pretty fast. |
Co-authored-by: mjp41 <mattpark@microsoft.com>
mjp41
left a comment
There was a problem hiding this comment.
I think you need a check for zero alignment in memalign inside malloc.cc as our libc tests expect align 0 to work.
Great, work. Thanks
CMakeLists.txt
Outdated
| add_library(${name} ${type} ${ARGN}) | ||
| target_link_libraries(${name} snmalloc_lib) | ||
| if(MSVC) | ||
| target_compile_definitions(${name} PRIVATE "SNMALLOC_EXPORT=__declspec(dllexport)") |
There was a problem hiding this comment.
I am not sure what is required to link this into Rust. As this is a static lib, I would have said it isn't needed. @snf any thoughts?
Are these requirements treated correctly in current method? |
I believe so. Well, as much as they were before. We don't meet the posix_memalign spec when alignment > SUPERSLAB_SIZE. I intend to fix this later. |
|
I noticed the codegen for dealloc where a size is supplied is not as good as it could be. I'll submit a separate PR refactoring that. |
|
Unless you want to add something more, I'm happy to merge this when it passes CI. |
|
I think it is ready then. |
This pr tracks the work of adding a static linking target for rust support.
My suggestion is that the merging should be considered after the discussions in #109 and rust-lang/rust#68381 are finished.